
    Comparing SPHINX vs. SONIC Italian Children Speech Recognition Systems

    Our previous experience has shown that CSLR SONIC and CMU SPHINX are both versatile and powerful tools for Automatic Speech Recognition (ASR). Encouraged by those good results, we compared the two systems on another important ASR challenge: the recognition of children's speech. In this work, SPHINX was used to build a recognizer for Italian children's speech from scratch, and the results were compared with those obtained with SONIC, both in previous experiments and in new ones designed to give the two systems uniform experimental conditions. This report describes the training process and the evaluation methodology for a speaker-independent phonetic-recognition task. First we briefly describe the system architectures and their differences; then we analyze the task, the corpus, and the techniques adopted to address the recognition problem. The final discussion reports the scores of multiple tests in terms of Phonetic Error Rate (PER) and an analysis of the differences between the two systems. SONIC turned out to have the best overall performance, reaching a minimum PER of 12.4% with VTLN and SMAPLR adaptation. SPHINX was the easiest system to train and test, and its performance (PER of 17.2% with comparable adaptations) was only a few percentage points behind SONIC's
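
    The PER figures quoted above follow the standard edit-distance definition over phone sequences. A minimal sketch of how such a score can be computed (the function name and inputs are illustrative, not taken from either toolkit):

```python
def phone_error_rate(ref, hyp):
    """Phone Error Rate = (substitutions + deletions + insertions) / len(ref),
    computed via Levenshtein edit distance between phone sequences."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # delete all remaining reference phones
    for j in range(m + 1):
        d[0][j] = j          # insert all remaining hypothesis phones
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[n][m] / n
```

    For example, `phone_error_rate(list("kaza"), list("kasa"))` counts one substitution over four reference phones, giving 0.25.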

    Evalita-Istc Comparison Of Open Source Tools On Clean And Noisy Digits Recognition Tasks

    EVALITA is a recent initiative devoted to the evaluation of Natural Language and Speech Processing tools for Italian. Its general objective is to promote the development of language and speech technologies for the Italian language by providing a shared framework where different systems and approaches can be evaluated in a consistent manner. This work describes the results of evaluating three open-source ASR toolkits (CSLU Speech Tools, CSLR SONIC, SPHINX) on the EVALITA clean and noisy digit-recognition tasks, together with the complete evaluation methodology

    A FACIAL ANIMATION FRAMEWORK WITH EMOTIVE/EXPRESSIVE CAPABILITIES

    LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of the FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted by the ELITE optotracking movement analyzer. LUCIA can copy a real human by reproducing the movements of passive markers positioned on the face and recorded by the ELITE device, or it can be driven by an emotionally XML-tagged input text, thus realizing true audio/visual emotive/expressive synthesis. Synchronization between visual and audio data is very important in order to create the correct WAV and FAP files needed for the animation. LUCIA's voice is based on the ISTC Italian version of the FESTIVAL-MBROLA packages, modified by means of an appropriate APML/VSML tagged language. LUCIA is available in two different versions: an open source framework and the "work in progress" WebGL
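
    The WAV/FAP synchronization mentioned above amounts to mapping audio timestamps onto FAP frame indices at a fixed animation frame rate. A minimal sketch, assuming a 25 fps FAP stream and 16 kHz audio (both values are illustrative; the actual LUCIA pipeline may use different rates):

```python
import math

def audio_to_fap_frames(num_samples, sample_rate=16000, fap_fps=25):
    """Number of FAP frames needed to cover a WAV clip of num_samples
    samples, assuming a fixed FAP animation frame rate."""
    return math.ceil((num_samples / sample_rate) * fap_fps)

def fap_frame_at(t_seconds, fap_fps=25):
    """Index of the FAP frame active at audio time t_seconds."""
    return int(t_seconds * fap_fps)
```

    One second of 16 kHz audio thus needs 25 FAP frames, and any audio event can be routed to the frame active at its timestamp.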

    A 3d talking head for mobile devices based on unofficial ios webgl support

    In this paper we present the implementation of a WebGL Talking Head for iOS mobile devices (Apple iPhone and iPad). It works on standard MPEG-4 Facial Animation Parameters (FAPs) and speaks with the Italian version of the FESTIVAL TTS. It is entirely based on real human data: 3D kinematic information is used to create the lip articulatory model and to drive the talking face directly, generating human facial movements. In the last year we developed the WebGL version of the avatar. WebGL, the 3D graphics API for the web, is currently supported by the major desktop web browsers, but no official support has yet been provided on the main mobile platforms, although the Firefox beta enables it on Android phones. Starting from iOS 5, WebGL is enabled only for the advertisement library class (which is intended for placing ad banners in applications). We have been able to use this feature to visualize and animate our WebGL talking head

    LUCIA: An open source 3D expressive avatar for multimodal h.m.i.

    LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of the FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted by the ELITE optotracking movement analyzer. LUCIA can copy a real human by reproducing the movements of passive markers positioned on the face and recorded by the ELITE device, or it can be driven by an emotionally XML-tagged input text, thus realizing true audio/visual emotive/expressive synthesis. Synchronization between visual and audio data is very important in order to create the correct WAV and FAP files needed for the animation. LUCIA's voice is based on the ISTC Italian version of the FESTIVAL-MBROLA packages, modified by means of an appropriate APML/VSML tagged language. LUCIA is available in two different versions: an open source framework and the "work in progress" WebGL

    Due tecniche di vocoding per la sintesi di parlato emotivo mediante trasformazione del timbro vocale

    This article describes two vocal-timbre modification techniques used in a voice transformation experiment whose goal is to reproduce some characteristics of emotional speech. The speech signal produced by a speaker with a neutral reading style is converted so as to reproduce the spectral envelope used by the same speaker in a non-neutral emotional situation. The conversion function between the spectral envelopes is computed with a method trained on real data. For this purpose, a database was used containing the voice of a single speaker recorded while reading/acting a corpus of texts in different emotional styles: happy, sad, and a neutral reference style. The two waveform-generation (vocoding) techniques considered are the Phase Vocoder and the MLSA (Mel Log Spectrum Approximation) filter. The two implemented prototypes were evaluated with perceptual tests, while objective evaluations confirmed the effectiveness of the conversion function
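
    The idea of a conversion function trained on paired neutral/emotional frames can be illustrated, in drastically simplified form, by a per-bin affine mapping fitted by least squares on time-aligned spectral envelopes. This is a toy stand-in, not the method used in the paper; all names are illustrative:

```python
def fit_bin_affine(src_frames, tgt_frames):
    """Fit tgt ~ a*src + b by ordinary least squares, independently for
    each spectral bin, over paired (time-aligned) envelope frames."""
    n = len(src_frames)
    n_bins = len(src_frames[0])
    params = []
    for k in range(n_bins):
        xs = [f[k] for f in src_frames]
        ys = [f[k] for f in tgt_frames]
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x - mx) ** 2 for x in xs)
        a = cov / var if var else 1.0   # fall back to identity slope
        params.append((a, my - a * mx))
    return params

def convert_envelope(frame, params):
    """Apply the trained per-bin mapping to one source envelope frame."""
    return [a * x + b for x, (a, b) in zip(frame, params)]
```

    In a real system the mapped envelope would then be imposed on the neutral signal by a vocoder such as the Phase Vocoder or the MLSA filter.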

    Analisi gerarchica degli inviluppi spettrali differenziali di una voce emotiva

    This article describes a new method for analysing vocal timbre through the study of the spectral-envelope variations used by the same speaker in neutral and expressive emotional situations. The analysis is based on a single-speaker corpus in which the speaker was instructed to read a set of sentences first in a neutral reading style and then in two emotional modes: a happy style and a sad style. The spectral envelopes of the time-aligned neutral and expressive (happy and sad) realisations are compared using a differential method. The differences are computed between the emotional and the neutral state, so the two categories compared are neutral-happy and neutral-sad. The statistics of the differential envelopes are computed for each phone. The data are examined using an agglomerative hierarchical clustering method. The resulting clusters are validated with several distance measures between the statistical distributions and explored visually to find similarities and differences between the two categories. The results highlight systematic variations in vocal timbre across the two sets of spectral-envelope differences. These traits depend on the valence of the emotion considered (positive, negative) as well as on phonetic properties of the individual phone, such as voicing and place of articulation

    Cluster Analysis of Differential Spectral Envelopes on Emotional Speech

    This paper reports on the analysis of the spectral variation of emotional speech. Spectral envelopes of time aligned speech frames are compared between emotionally neutral and active utterances. Statistics are computed over the resulting differential spectral envelopes for each phoneme. Finally, these statistics are classified using agglomerative hierarchical clustering and a measure of dissimilarity between statistical distributions and the resulting clusters are analysed. The results show that there are systematic changes in spectral envelopes when going from neutral to sad or happy speech, and those changes depend on the valence of the emotional content (negative, positive) as well as on the phonetic properties of the sounds such as voicing and place of articulation
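
    The clustering step described above can be sketched as a small average-linkage agglomerative procedure over per-phoneme statistics, where `dist` stands in for the dissimilarity measure between statistical distributions (all names are illustrative, not the paper's implementation):

```python
def agglomerative(points, n_clusters, dist):
    """Average-linkage agglomerative hierarchical clustering: start with
    singleton clusters and repeatedly merge the pair with the smallest
    average pairwise dissimilarity until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters
```

    With scalar stand-ins for the per-phoneme statistics and absolute difference as `dist`, `agglomerative([1.0, 1.1, 5.0, 5.2], 2, lambda a, b: abs(a - b))` groups the two low values together and the two high values together.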